It has been over a decade since I started writing code. Writing code has always been fun. During that time, code has always thrown challenges at me to solve. Over time, I have learned a few things about writing code more efficiently. I have also learned that it is not always about writing the most efficient code, but about balancing efficient code against organized, readable code built with design patterns (DPs). Below I have summarized a few performance mistakes we can make as software developers. I have made a couple of these mistakes myself in the course of becoming a programmer. They were good experiences for me and taught me many lessons.
I have seen many cases where multiple DB calls are problematic. First, you need to know that opening a DB connection for each DB call is quite costly. You should try to reuse the connection you opened initially as much as you can. Second, if you are inserting a bunch of rows, try to insert everything with one call instead of calling “insert” multiple times inside a loop. This is a no-brainer. Say a single query sent to the DB executes this sequence of tasks: 1) query sent 2) query parsed 3) row inserted 4) index updated. Now, let’s say all of this takes about 10ms, which is relatively fast. If we want to insert 100 rows and place the insert call inside a loop that iterates 100 times, we pay 100 × 10ms = 1000ms. Compare that with one bulk insert for the same 100 rows: the query is sent and parsed once, and only the row and index steps repeat for each row. The insert step itself probably takes longer with a bulk insert, however, that overhead is relatively small compared to executing the inserts one at a time. I don’t think many of you run SELECTs in a loop with multiple calls. Don’t do this for inserts either.
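Here is a minimal sketch of the two approaches, using Python's built-in sqlite3 module with an in-memory database. The `users` table and the helper names are made up for illustration. An in-memory SQLite database has no network round trip, so the gap will be much smaller here than against a remote DB server, where a multi-row INSERT or the driver's batch API is the usual tool.

```python
import sqlite3

def make_db():
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def insert_one_by_one(conn, rows):
    # One execute() call per row: the per-call overhead is paid 100 times.
    for row in rows:
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", row)
    conn.commit()

def insert_bulk(conn, rows):
    # One executemany() call: the same statement is run for every row.
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
    conn.commit()

rows = [(i, f"user{i}") for i in range(100)]

loop_conn = make_db()
insert_one_by_one(loop_conn, rows)
loop_count = loop_conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

bulk_conn = make_db()
insert_bulk(bulk_conn, rows)
bulk_count = bulk_conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Both versions end up with the same 100 rows. The difference is how many times the application pays the per-call cost. Note that both reuse a single connection for all of the work, which is the first point above.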
A lot of people ignore “software update alerts”, which may include significant performance fixes or fixes for security vulnerabilities. I always prefer to stay on the latest version, whether it is an OS update or a small library dependency update for the software. There can always be a situation where we cannot upgrade to a later version. When we discover such a situation, we should consider fixing the cause right away and then do the upgrade. If the upgrade keeps breaking things, you should consider moving away from that solution entirely and onto a better one for the future.
This one is obvious. You need to know how each data structure is designed and what it is meant for, and use it appropriately. For example, a “List” can contain duplicate items, but searching it for a specific item isn’t all that pretty because it may have to check every element. A “Set”, on the other hand, contains only unique items (no duplicates), and a lookup in a hash-based “Set” takes roughly constant time on average (thanks to the hash code). You will see the benefit especially when the data structure is large. This is just one example. There are benefits to using Array, List, Stack/Queue, Tree, Bits etc. Knowing which data structure to use in a given situation is the key to successful software.
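To make the List versus Set difference concrete, here is a small sketch in Python that times a membership check on both. The collection size and the target value are arbitrary choices for this example, and the exact timings will vary by machine.

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)
target = 99_999  # last element: the worst case for a linear list scan

# A list lookup walks the elements one by one until it finds a match.
list_time = timeit.timeit(lambda: target in items_list, number=100)

# A set lookup hashes the value and jumps to its bucket.
set_time = timeit.timeit(lambda: target in items_set, number=100)

# A set also drops duplicates, while a list keeps them.
with_duplicates = [1, 2, 2, 3, 3, 3]
unique_items = set(with_duplicates)

print(f"list lookup: {list_time:.6f}s, set lookup: {set_time:.6f}s")
```

The trade-off is that a set does not keep duplicates or positions, so it is only the right choice when you need neither.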
You should not reach for threads just to boost performance. Threads should be considered when the application actually requires them (such as a web server serving many users at once). Making a small program multi-threaded doesn’t add much value. Threads are hard to program, not to mention the pain of debugging them. As a matter of fact, an event-driven design is often faster than a pile of complicated threads, because there is no locking overhead and no context switching between threads. So if threads don’t add much value to your program, it is best to just avoid them. Remember that threads do not always mean the optimal and efficient solution, and they should not be used just to boost performance.
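As a rough illustration of the event-driven alternative, here is a sketch using Python's asyncio. The `handle_request` function and its delays are invented for this example, with `asyncio.sleep` standing in for real I/O such as a network call.

```python
import asyncio

async def handle_request(request_id, delay):
    # Simulated I/O wait. While this task waits, the event loop
    # is free to run the other tasks.
    await asyncio.sleep(delay)
    return f"request {request_id} done"

async def main():
    # All three handlers run on a single thread, so this code needs
    # no locks around shared state.
    return await asyncio.gather(
        handle_request(1, 0.02),
        handle_request(2, 0.01),
        handle_request(3, 0.03),
    )

results = asyncio.run(main())
```

The three waits overlap even though there is only one thread. This works well for I/O-bound work. It does not help with CPU-heavy work, which would block the event loop.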
For a typical RPC protocol (especially on the web), where the server is communicating with a browser (client), JSON and XML work well because the server and client exchange “human-readable” messages, which are also easy for humans to debug. However, if the client is a machine instead of a browser, there is no need for this human visibility in the exchanged data, so we can reduce the overhead of encoding the messages. We should consider sending the data in binary. Some binary serialization and RPC frameworks I have used include Google Protocol Buffers, Apache Thrift, and ZeroC Ice. They were all well supported, with great documentation and examples.
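The size difference is easy to see with a small sketch. This one uses Python's standard `struct` module rather than any of the frameworks named above, and the record layout (`sensor_id`, `timestamp`, `value`) is a made-up example, not a real protocol.

```python
import json
import struct

reading = {"sensor_id": 42, "timestamp": 1700000000, "value": 21.5}

# Text encoding: field names and digits are sent as characters.
json_bytes = json.dumps(reading).encode("utf-8")

# Hypothetical fixed binary layout, little-endian with no padding:
# unsigned 32-bit id, unsigned 64-bit timestamp, 64-bit float.
RECORD = struct.Struct("<IQd")
binary_bytes = RECORD.pack(
    reading["sensor_id"], reading["timestamp"], reading["value"]
)

# The receiver must know the same layout to decode the bytes.
sensor_id, timestamp, value = RECORD.unpack(binary_bytes)

print(len(json_bytes), "bytes as JSON,", len(binary_bytes), "bytes as binary")
```

The binary form is smaller and cheaper to parse, but both sides must agree on the layout in advance. Managing that agreement is what schema-based tools like Protocol Buffers and Thrift do for you.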
This is surprisingly common in a lot of code. The worst I have seen so far is a DB call that SELECTs elements without ever using the result. If the client executes this part of the code every now and then, it adds another few milliseconds to the client for no reason. The best approach for me is simply to read the function from top to bottom and ask myself whether each call makes sense. A modern IDE will actually warn you when certain variables are not being used. That is another great reason to stick with a modern IDE.
A problem can often be simplified by applying the right mathematical formula. Using a formula, or even just a simple square root or min/max, can save a lot of time compared with writing multiple loops full of if/else statements. When a problem sounds very mathematical, consider applying mathematical rules first, before you dive into writing code.
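A classic example, chosen here only for illustration, is summing the integers from 1 to n. The loop does n additions, while the closed-form formula n(n + 1)/2 gives the same answer in a single step.

```python
def sum_with_loop(n):
    # Adds every integer from 1 to n, one at a time: n steps.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_with_formula(n):
    # Closed form n(n + 1)/2: one multiplication and one division.
    # n * (n + 1) is always even, so integer division is exact.
    return n * (n + 1) // 2

loop_result = sum_with_loop(100)
formula_result = sum_with_formula(100)
```

Both functions return the same value for any non-negative n, but the formula's cost does not grow with n.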
The topic of scalability keeps getting more important as data grows over time. When it comes to scalability, I like to think small first, even on a large-scale project (like fitting everything on my own computer first), and break it down from there. The most important thing to remember about scalability is having “room to grow while maintaining reasonable performance”. Think about what it means to be a “scalable system”. The question to ask is whether the system we designed is going to hold the data (or whatever items) we initially hoped it would. Performance becomes critical as we maintain that scalable system, because the data keeps growing and yet we don’t want performance to degrade. We should consider solutions such as sharding, SQL or NoSQL, distribution, big data tooling, indexing, caching etc. There are many techniques for maintaining good performance.
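As one small example of these techniques, here is a sketch of hash-based sharding in Python. The function name, the key format, and the shard count are assumptions made for illustration. A real system would also need a plan for adding shards later.

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash so the same key always maps to the same shard,
    # across processes and restarts (unlike Python's built-in hash()
    # for strings, which is randomized per process by default).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 4  # hypothetical number of database shards
user_keys = [f"user:{i}" for i in range(1000)]

# Count how many keys land on each shard to see how the data spreads.
distribution = Counter(shard_for(key, NUM_SHARDS) for key in user_keys)
print(dict(distribution))
```

Each shard then holds only a fraction of the data, so each one stays small enough to perform well. One caveat: with plain modulo sharding, changing the number of shards remaps most keys.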